dplyr: data manipulation

Load tidyverse:

library(tidyverse)

General properties of survey data

  • Load the survey data in tibble format and inspect the meta information such as dimensions and column (variable) types.

  • Inspect the survey data with functions such as class, str and glimpse (tidyverse).
    • What is the difference between str and glimpse output?
  • Produce the following values on survey data by using the pipe operator %>%:

    • number of rows
    • number of columns
    • last 8 rows
    • first 7 rows
    • last 3 column names
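
The piped inspection calls above can be sketched on a small stand-in table. This is only an illustration: the tibble toy and its values are made up here, and the real survey data will have more rows and columns.

```r
library(tidyverse)

# Hypothetical toy data; the real survey file and its columns may differ.
toy <- tibble(
  name   = c("Ann", "Bob", "Cas", "Dee"),
  age    = c(18, 21, 19, 35),
  gender = c("female", "male", "male", "female"),
  height = c(170, 182, 175, 160)
)

glimpse(toy)                               # compact overview of columns and types

n_rows    <- toy %>% nrow()                # number of rows
n_cols    <- toy %>% ncol()                # number of columns
last_rows <- toy %>% tail(2)               # last 2 rows
top_rows  <- toy %>% head(3)               # first 3 rows
last_cols <- toy %>% names() %>% tail(3)   # last 3 column names
```

The same pattern applies to the survey data once the row and column counts in head and tail are adjusted to the values asked for.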

Selection & Filtering

Apply the following to survey data:

  • Rename the m.i variable to system.

  • Reorder the variables so that name, age and gender come first.

  • Select the last three variables

  • Deselect variables that relate to hand and/or arm (e.g. *.hnd, etc.)
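
As a hedged sketch of the selection helpers involved, the example below uses a made-up tibble. The column names wr.hnd and nw.hnd are assumptions chosen to match the *.hnd pattern; the actual survey columns may be named differently.

```r
library(tidyverse)

# Hypothetical toy data; wr.hnd and nw.hnd are assumed column names.
toy <- tibble(
  wr.hnd = c(18.5, 19.0),
  nw.hnd = c(18.0, 19.5),
  m.i    = c("Metric", "Imperial"),
  gender = c("female", "male"),
  age    = c(18, 21),
  name   = c("Ann", "Bob")
)

# rename() takes new_name = old_name
renamed <- toy %>% rename(system = m.i)

# everything() keeps the remaining columns after the ones listed first
reordered <- toy %>% select(name, age, gender, everything())

# last_col(1) is the second-to-last column, last_col() the last one
last_two <- toy %>% select(last_col(1):last_col())

# a minus sign in front of a helper drops the matching columns
no_hand <- toy %>% select(-ends_with(".hnd"))
```

For the last three variables, the offset passed to last_col would be 2 instead of 1.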

Summary values of the survey data:

  • Retrieve the distinct values for smoking habit (smokes); do the same for exercise pattern (exercise).

  • Derive the frequency (count) table of smoking, do the same for exercise pattern.

  • Derive the combined frequency (count) table of smoking by exercise pattern.

  • How many females are there who never smoked?

  • How many right-handed heavy smokers are there? Give the counts per gender.

  • Select teenagers.

  • What are the counts of smoking habits (smokes) in teenagers?

  • What are the counts of exercise patterns (exercise) in teenagers?
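
The counting and filtering steps above might look as follows on a small stand-in table. The level names ("Never", "Heavy", "Freq", ...) and the definition of a teenager as age 13 to 19 are assumptions made for this sketch, not facts about the survey data.

```r
library(tidyverse)

# Hypothetical toy data; level names are assumed, check distinct() on the real data.
toy <- tibble(
  gender   = c("female", "female", "male", "male", "female"),
  smokes   = c("Never", "Heavy", "Never", "Heavy", "Never"),
  exercise = c("Freq", "None", "Freq", "Some", "Some"),
  age      = c(17, 22, 19, 30, 18)
)

smoke_levels  <- toy %>% distinct(smokes)          # unique values of one variable
smoke_counts  <- toy %>% count(smokes)             # frequency table, counts in column n
cross_counts  <- toy %>% count(smokes, exercise)   # one row per observed combination

# several conditions in filter() are combined with AND
n_female_never <- toy %>%
  filter(gender == "female", smokes == "Never") %>%
  nrow()

# assumption: teenagers are aged 13 to 19 inclusive
teens        <- toy %>% filter(age >= 13, age <= 19)
teen_smoking <- teens %>% count(smokes)
```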

Add variables

  • Add a new column feet with the heights expressed in feet (1 foot = 30.48 cm).

  • Add a new column diffHandSpan: the absolute difference between the span of the writing hand (span1) and the span of the non-writing hand (span2).

  • Count the number of students whose writing hand span is smaller than their non-writing hand span.
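
A minimal sketch of adding derived columns with mutate, assuming height is stored in cm and the spans in span1 and span2 as described above. The values are invented for illustration.

```r
library(tidyverse)

# Hypothetical toy data; heights in cm, hand spans in cm.
toy <- tibble(
  height = c(152.4, 182.88),
  span1  = c(18.0, 20.5),
  span2  = c(18.5, 20.0)
)

with_new <- toy %>%
  mutate(
    feet         = height / 30.48,       # 1 foot = 30.48 cm
    diffHandSpan = abs(span1 - span2)    # absolute difference, always >= 0
  )

# rows where the writing hand span is the smaller one
n_smaller <- with_new %>% filter(span1 < span2) %>% nrow()
```

Note that the absolute difference discards the sign, so the count of smaller writing hand spans has to compare span1 and span2 directly.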

Calculate summaries and grouping

Summary (summarise)

In the survey data, summarise:

  • mean age along with total count

  • mean writing and non-writing hand span (span1, span2)

  • mean, minimum and maximum feet (height)
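
A hedged sketch of summarise on invented data follows. It assumes the real survey may contain missing values, which is why na.rm = TRUE is shown; whether that is needed depends on the actual data.

```r
library(tidyverse)

# Hypothetical toy data with one missing age.
toy <- tibble(
  age   = c(18, 20, NA, 22),
  span1 = c(18, 19, 20, 21),
  span2 = c(17, 19, 21, 21)
)

# n() counts rows, including the one with a missing age
age_summary <- toy %>%
  summarise(mean_age = mean(age, na.rm = TRUE), n = n())

# several summaries can be computed in a single call
span_summary <- toy %>%
  summarise(mean_span1 = mean(span1), mean_span2 = mean(span2))
```

Without na.rm = TRUE, mean returns NA as soon as one value is missing.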

Group (group_by)

  1. Summarise the average age separately for each of the following groups:
  • gender

  • smoking habit

  • gender and smoking habit

  • exercise pattern

  • gender and exercise pattern

  2. Repeat the previous exercise, adding a new column with the group sizes.
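
Grouped summaries with group sizes can be sketched like this, again on made-up data. The .groups = "drop" argument is a choice made for this example so that the result is an ungrouped tibble; it is available in dplyr 1.0.0 and later.

```r
library(tidyverse)

# Hypothetical toy data; level names are assumed.
toy <- tibble(
  gender = c("female", "female", "male", "male"),
  smokes = c("Never", "Never", "Heavy", "Never"),
  age    = c(18, 22, 30, 20)
)

# one grouping variable: one output row per gender
by_gender <- toy %>%
  group_by(gender) %>%
  summarise(mean_age = mean(age), n = n(), .groups = "drop")

# two grouping variables: one output row per observed combination
by_gender_smokes <- toy %>%
  group_by(gender, smokes) %>%
  summarise(mean_age = mean(age), n = n(), .groups = "drop")
```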

Sort (arrange)

Order the survey data by:

  • name

  • gender and name

  • name and gender

  • gender, name and smoking habit

  • descending order of gender, name and descending order of height
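
The sorting variants above can be illustrated on a small invented table. The order of the arguments to arrange determines which variable is sorted first, and desc reverses the order for a single variable.

```r
library(tidyverse)

# Hypothetical toy data with a duplicated name to show tie-breaking.
toy <- tibble(
  name   = c("Bob", "Ann", "Cas", "Ann"),
  gender = c("male", "female", "male", "female"),
  height = c(180, 165, 175, 170)
)

# sort by gender first, then by name within each gender
by_gender_name <- toy %>% arrange(gender, name)

# desc() applies only to the variable it wraps
by_desc <- toy %>% arrange(desc(gender), name, desc(height))
```

In by_desc the two rows named Ann are tied on gender and name, so the descending height decides their order.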

Extra exercises

  1. Calculate the following in the survey data:
  • average height

  • minimum and maximum writing hand span span1

  • minimum and maximum non-writing hand span span2

  2. In the survey data, some students reported their heights in metric units and some in imperial units; all heights were afterwards converted to metric (cm). Add a new column reportedHeight with the height as reported by the student in the original unit, as indicated by the system (or m.i) variable. Assume inches in the case of imperial (1 inch = 2.54 cm). Hint: use ifelse to test for the system used.
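
To illustrate the hint, here is a minimal sketch of ifelse inside mutate. The values "Metric" and "Imperial" are assumed labels for the system variable; check the distinct values in the real data before reusing the condition.

```r
library(tidyverse)

# Hypothetical toy data; "Metric" and "Imperial" are assumed level names.
toy <- tibble(
  system = c("Metric", "Imperial"),
  height = c(180, 152.4)              # stored in cm for every student
)

# ifelse(test, yes, no) works element by element on whole columns
converted <- toy %>%
  mutate(reportedHeight = ifelse(system == "Imperial", height / 2.54, height))
```

If the system variable contains missing values, ifelse returns NA for those rows.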

Copyright © 2020 Biomedical Data Sciences | LUMC